DALI Lab Data Challenge 22W, Task 1: Exploration & Visualization

Jeremy Hadfield, 3.2.2022

Part 1: Visualization & Exploration

This dataset contains economic statistics on most countries, compiled from multiple sources, including the World Bank, regional databases such as SEDLAC (Socio-Economic Database for Latin America & the Caribbean), UNECLAC, and EDLAC, academic papers, and more. Each column has about 10,000 rows of non-null data, covering consumption, income, earnings, inequality (Gini coefficient), and more.

Question 1: How do population and GDP per capita vary across the world over time?

To start, we will plot the data on a world map to visualize the differences in population and GDP per capita in different parts of the world.

The animated visualization above is useful for understanding the differences in GDP per capita across the world. The brighter colors show that regions like the US and Canada, Europe, Japan, Korea, and Australia have significantly higher GDPs per capita than most of the rest of the world. However, the size of the bubbles shows that regions like China, India, and Indonesia comprise a far greater proportion of global population. Finally, the time slider at the bottom of the graph can be used to animate the data over time. This also allows us to see changes in the availability of data over time (e.g. GDP per capita data is not available until the 1990s, and Africa is very underrepresented in this dataset), revealing how this dataset is biased or flawed. The animation can also be used to visualize trends and changes in GDP and population over time.

Question 2: How does a country's region and population relate to household consumption?

Economic conditions change dramatically depending on the part of the world a country is in. It is commonly assumed that consumers in regions like the Americas and Europe spend more than consumers in regions like Sub-Saharan Africa. In this part, I visualize the data to test this assumption and determine which regions make up a larger share of global household consumption.

The data includes a column with labels on the estimated quality of each observation. I considered filtering out low-quality or unknown-quality observations, but this would remove about 34% of the observations. It would also bias the resulting data toward higher-income countries, where better data is available. Thus, I did not filter the data by quality.
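The cost of a quality filter can be checked before deciding whether to apply it. The following is a minimal sketch, not the original analysis code: the column name `quality` and its labels are assumptions, and the rows are made up, so the printed share does not reproduce the 34% figure.

```python
import pandas as pd

# Toy stand-in for the dataset. The column name "quality" and its
# labels are hypothetical; the real dataset may use different ones.
df = pd.DataFrame({
    "country": ["A", "B", "C", "D", "E", "F"],
    "quality": ["High", "Low", "Average", None, "High", "Low"],
})

def share_dropped(frame, column="quality", drop_labels=("Low",)):
    """Return the fraction of rows whose quality label is missing
    (unknown) or listed in drop_labels."""
    mask = frame[column].isna() | frame[column].isin(list(drop_labels))
    return mask.mean()

dropped = share_dropped(df)
print(f"Filtering would drop {dropped:.0%} of observations")
```

Grouping the same mask by region or income group would show whether the filter removes observations unevenly, which is the bias concern raised above.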

Above: A chart of total average per capita household consumption by region in 2010.

This chart was created by taking the mean household consumption per capita values for each region and adding them together to get a total consumption value for that region. It shows that Asia dominates global household consumption, making up a large share of the global total, followed by the Americas and Europe. However, this should be taken with a grain of salt, as Asia also has most of the world's population. See the graph below.
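The aggregation described above can be sketched as a filter followed by a group-wise sum. This is an illustration with made-up numbers, not the original code; the column names `region`, `country`, `year`, and `mean_consumption` are hypothetical.

```python
import pandas as pd

# Made-up rows; column names are assumptions, not the dataset's own.
df = pd.DataFrame({
    "region": ["Asia", "Asia", "Asia", "Europe", "Europe"],
    "country": ["X", "Y", "V", "Z", "W"],
    "year": [2010, 2010, 2010, 2010, 2010],
    "mean_consumption": [100.0, 200.0, 150.0, 300.0, 250.0],
})

def regional_totals(frame, year):
    """Sum the mean per capita consumption values within each region
    for a single year."""
    subset = frame[frame["year"] == year]
    return subset.groupby("region")["mean_consumption"].sum()

totals = regional_totals(df, 2010)
shares = totals / totals.sum()  # each region's share of the summed total
print(shares)
```

Because this sums one value per observation, a region with more countries (or more observations) in the data gets a larger total regardless of how much each person consumes, which is one reason to read the result alongside the population shares.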

Above: A pie chart of global population composition.

The visualizations above support the conclusion that Asia makes up the largest share of global consumption, while the Americas and Europe make up most of the remainder. However, when accounting for population, the proportion of consumption in Asia no longer seems so dramatic. Also, this dataset is likely somewhat biased: it includes less data from Africa and Oceania, so these regions appear to have lower populations than they actually do.

Question 3: How have Gini coefficients changed over time?

The Gini coefficient is a number meant to represent inequality within a group (like a nation). Here, I try to determine how the Gini coefficient varies over time, both globally and for specific groups of countries.

Above: A line plot, showing how average global Gini coefficients have changed over time.

This graph shows that data collection on Gini coefficients began in earnest around 1940. From there, global inequality as measured by the Gini coefficient increased, peaking around 1960. Inequality declined from the 1970s to the late 1980s. Finally, global inequality stagnated at a Gini coefficient of around 40 from 2000 to 2020. This is based on a global average of Gini coefficients, however, and inequality in some regions has likely increased (e.g. in the US). See below.

Above: a line plot, showing how Gini coefficients have changed over time by country

This chart is a modification of the above, but for a specific group of countries of interest. Inequality as measured by the Gini coefficient is especially high in Brazil, but has been declining there since 2000. In contrast, inequality in South Africa is extremely high and shows no sign of declining. The United States has hovered around a Gini coefficient of 40 since the 1950s, but in the last 10 years inequality has increased dramatically. China has experienced a significant swing in inequality, with its Gini coefficient decreasing to the extremely low value of 20 around the 1980s, but then increasing again to be on par with the United States by the 2020s. This may reflect privatization and the rise of state capitalism in China. Another notable feature of this chart is the skyrocketing inequality in Russia after the collapse of the Soviet Union in the 1990s.

Above: a line plot, showing how Gini coefficients of income groups have changed over time

Finally, the above shows how the Gini coefficients of countries in different income groups have changed over time, smoothed with a 3-year moving average. Somewhat surprisingly, high-income countries show the lowest average Gini coefficient. Upper-middle-income countries show the highest inequality, while low-income countries show extreme variability until the 2010s, when they settle at a more stable Gini coefficient, averaging between 40 and 50.
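A 3-year moving average per income group can be computed by averaging Gini values per group and year, then applying a rolling mean. This is a sketch on made-up values: the column names `income_group`, `year`, and `gini` are hypothetical, and a trailing (not centered) window is an assumption, since the text does not say which was used.

```python
import pandas as pd

# Made-up values; column names are assumptions.
df = pd.DataFrame({
    "income_group": ["High"] * 4 + ["Low"] * 4,
    "year": [2000, 2001, 2002, 2003] * 2,
    "gini": [30.0, 33.0, 36.0, 33.0, 50.0, 40.0, 45.0, 35.0],
})

# Mean Gini per income group and year, with one column per group.
yearly = (
    df.groupby(["income_group", "year"])["gini"]
    .mean()
    .unstack("income_group")
    .sort_index()
)

# Trailing 3-row window; the first two years have no full window
# and are left as NaN.
smoothed = yearly.rolling(window=3, min_periods=3).mean()
print(smoothed)
```

The window counts rows, not calendar years, so if a group has no observations in some years the average spans more than three years. Reindexing to a complete year range first would make such gaps explicit.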

Question 4: How does Gini coefficient vary by population and GDP?

Here, we attempt to determine how inequality as measured by the Gini coefficient is related to GDP and population.

Above: how GDP per capita has changed over time for countries with populations over 50 million since 1990.

Most countries show increasing GDP per capita since the 1990s. The US has an especially high GDP per capita. A slight decrease in GDP per capita is also visible during the 2008-2009 global financial crisis and recession. The most significant and consistent increase in GDP per capita is in China, which raised its GDP per capita in 2011 dollars from 3,922 in 2005 to 14,146 in 2016, roughly a 3.6x increase.
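The growth multiple implied by the two figures for China can be checked directly. The sketch below uses only the values quoted above; the helper names are illustrative, and the annualized rate assumes smooth compounding over the 11 years, which is a simplification.

```python
def growth_multiple(start_value, end_value):
    """Ratio of the end value to the start value."""
    return end_value / start_value

def annualized_growth(start_value, end_value, years):
    """Constant yearly growth rate that would turn start_value into
    end_value over the given number of years."""
    return (end_value / start_value) ** (1 / years) - 1

multiple = growth_multiple(3922, 14146)
rate = annualized_growth(3922, 14146, 2016 - 2005)
print(f"{multiple:.2f}x overall, {rate:.1%} per year")
```

The ratio comes out at about 3.6, which corresponds to an average growth rate of a little over 12% per year under the compounding assumption.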

The scatterplot is hard to interpret, but it does reveal some interesting patterns. South Africa forms a notable cluster, with a much higher Gini coefficient than other countries in the same time period with comparable GDP per capita. Shifting the graph so that GDP per capita is in focus, with Gini index in the background, reveals no overwhelmingly clear correlation between GDP and inequality. However, there does seem to be a rough negative correlation, where Gini index declines with increasing GDP per capita. Countries like Brazil, Mexico, and South Africa have very high Gini coefficients for their GDP, and all have relatively low GDPs per capita. The interactive plot also allows for more open-ended exploration of specific years, countries, and Gini coefficients.
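A visual impression of a rough correlation can be backed by a number. The sketch below computes a Pearson correlation and a rank-based (Spearman) correlation on made-up values constructed to fall as GDP rises; it says nothing about the strength of the relationship in the actual dataset, and the column names `gdp_per_capita` and `gini` are hypothetical.

```python
import pandas as pd

# Made-up values chosen to decline monotonically; column names are
# assumptions.
df = pd.DataFrame({
    "gdp_per_capita": [2000.0, 5000.0, 10000.0, 20000.0, 40000.0],
    "gini": [55.0, 50.0, 45.0, 38.0, 32.0],
})

# Pearson measures linear association on the raw values.
pearson = df["gdp_per_capita"].corr(df["gini"])

# Spearman is the Pearson correlation of the ranks, so it captures any
# monotonic relationship and is less sensitive to skewed GDP values.
spearman = df["gdp_per_capita"].rank().corr(df["gini"].rank())

print(f"Pearson: {pearson:.2f}, Spearman: {spearman:.2f}")
```

Since GDP per capita is heavily skewed, the rank-based measure (or a log-scaled GDP axis) is often the more informative of the two for data like this.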

Above: A bubble chart of Gini coefficients over time, with bubble size determined by population and color determined by GDP per capita.

This chart allows us to see some trends more clearly. The line for the US is much brighter and more visible because of its higher GDP per capita, and we can see that US inequality has increased steadily, with 2017 as an outlier high-inequality year. All countries seem to be converging somewhat, toward Gini coefficients in the 40-50 range. The rise in the world's overall GDP per capita is visible in the brighter colors of later years, with China's increasing GDP per capita especially noticeable. Brazil's decreasing Gini coefficient and China's increasing inequality are also evident.